Abstract: We consider a generalization of the multiple measurement vector (MMV) problem, where the measurement matrices are allowed to differ across measurements. This problem arises naturally when multiple measurements are taken over time, e.g., when the measurement modality (matrix) is time-varying. We derive probabilistic recovery guarantees showing that, under certain (mild) conditions on the measurement matrices, $\ell_2/\ell_1$-norm minimization and a variant of orthogonal matching pursuit fail with a probability that decays exponentially in the number of measurements. This allows us to conclude that, perhaps surprisingly, recovery performance does not suffer from the individual measurements being taken through different measurement matrices. What is more, recovery performance typically benefits (significantly) from diversity in the measurement matrices; we specify conditions under which such improvements are obtained. These results continue to hold when the measurements are subject to (bounded) noise.



I. Introduction 

An interesting generalization of the sparse signal recovery 
problem as studied, e.g., in [1], [2], [3], is the so-called 
multiple measurement vector (MMV) problem [4], [5], [6], 
[7]. Application areas of the MMV problem include neu- 
romagnetic imaging, array processing, and nonparametric 
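spectral analysis of time series [4]. The MMV problem is formalized as follows: Given the vectors $\mathbf{x}^{(0)}, \dots, \mathbf{x}^{(d-1)}$ that share the sparsity pattern $\mathcal{S}$, i.e., the entries of $\mathbf{x}^{(0)}, \dots, \mathbf{x}^{(d-1)}$ are equal to zero on the complement $\bar{\mathcal{S}}$ of $\mathcal{S}$, we want to recover the $\mathbf{x}^{(i)}$ from the noisy measurements $\mathbf{y}^{(i)} = \mathbf{A}\mathbf{x}^{(i)} + \mathbf{e}^{(i)}$, $i = 0, \dots, d-1$, where the $\mathbf{e}^{(i)}$ are noise vectors and the measurement matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ is assumed known. For the noiseless case, i.e., $\mathbf{e}^{(i)} = \mathbf{0}$ for all $i$, it was shown in [5], [8] that the program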



$$\text{(P0-MMV)}\qquad \text{minimize } |\mathcal{S}| \quad \text{subject to } \mathbf{y}^{(i)} = \mathbf{A}\mathbf{x}^{(i)},\ i = 0, \dots, d-1$$

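recovers all $\mathbf{x}^{(0)}, \dots, \mathbf{x}^{(d-1)}$ with $\operatorname{rank}[\mathbf{x}^{(0)} \cdots \mathbf{x}^{(d-1)}] = K$ if and only if
$$|\mathcal{S}| < \frac{\operatorname{spark}(\mathbf{A}) - 1 + K}{2} \tag{1}$$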



where $\operatorname{spark}(\mathbf{A})$ is the cardinality of the smallest set of linearly dependent columns of $\mathbf{A}$ [2]. The threshold (1) constitutes a potentially significant improvement over the well-known $\operatorname{spark}(\mathbf{A})/2$-threshold [2] for the single measurement vector (SMV) case, i.e., for $d = 1$. Necessity of the threshold (1) shows that, when asking for recovery of all sets of vectors $\mathbf{x}^{(0)}, \dots, \mathbf{x}^{(d-1)}$, including linearly dependent collections (for $K = 1$, (1) reduces to the SMV threshold $\operatorname{spark}(\mathbf{A})/2$), multiple measurements do not result in an improvement in the recovery threshold over the SMV case. It is therefore sensible to ask whether performance improvements can be expected for "typical" $\mathbf{x}^{(0)}, \dots, \mathbf{x}^{(d-1)}$. Since (P0-MMV) is NP-hard [9], this question is usually posed with the proviso that computationally efficient algorithms such as $\ell_2/\ell_1$-norm minimization or a variant of orthogonal matching pursuit (OMP) [5], [6], [7] should be used for recovery. Indeed, a corresponding probabilistic performance analysis carried out in [10], [11] shows that multiple measurements yield significant improvements in recovery performance over the SMV case.
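To make threshold (1) concrete, the following Python sketch computes $\operatorname{spark}(\mathbf{A})$ by exhaustive search over column subsets and evaluates $(\operatorname{spark}(\mathbf{A}) - 1 + K)/2$. It assumes NumPy, and the function names `spark` and `mmv_threshold` are hypothetical. The search is exponential in $n$, so this is only an illustration for tiny matrices; returning infinity when all columns are independent is one common convention.

```python
import itertools

import numpy as np


def spark(A):
    """Size of the smallest linearly dependent set of columns of A.

    Brute force over all column subsets (exponential in n), so only
    suitable for tiny examples. Returns float("inf") if all columns of A
    are linearly independent.
    """
    n = A.shape[1]
    for k in range(1, n + 1):
        for cols in itertools.combinations(range(n), k):
            # k columns are linearly dependent iff their rank is below k.
            if np.linalg.matrix_rank(A[:, list(cols)]) < k:
                return k
    return float("inf")


def mmv_threshold(A, K):
    """Right-hand side of (1): all rank-K collections recoverable iff |S| < this."""
    return (spark(A) - 1 + K) / 2


A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
print(spark(A))             # 3: any two columns are independent, all three are not
print(mmv_threshold(A, 1))  # 1.5, i.e. the SMV threshold spark(A)/2
print(mmv_threshold(A, 2))  # 2.0
```

For $K = 2$ the threshold grows from 1.5 to 2.0, which is the kind of improvement (1) offers for full-rank collections.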

In practical applications the measurement matrix (modal- 
ity) often changes across measurements, e.g., when measure- 
ments are taken over time and the underlying measurement 
modality exhibits characteristics that vary over time. It is 
therefore natural to ask whether improvements thanks to mul- 
tiple measurements depend critically on the measurements all 
being taken through the same measurement matrix A. We an- 
swer this question by considering the following modification 
of the MMV problem, termed generalized MMV (GMMV) 
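problem henceforth: Given the vectors $\mathbf{x}^{(0)}, \dots, \mathbf{x}^{(d-1)}$ that share the sparsity pattern $\mathcal{S}$, recover the $\mathbf{x}^{(i)}$ from the (possibly noisy) measurements
$$\mathbf{y}^{(i)} = \mathbf{A}^{(i)}\mathbf{x}^{(i)} + \mathbf{e}^{(i)}, \quad i = 0, \dots, d-1 \tag{2}$$
assuming knowledge of the measurement matrices $\mathbf{A}^{(i)} \in \mathbb{R}^{m \times n}$. Here, the $\mathbf{e}^{(i)}$ are noise vectors.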

The GMMV problem also occurs in the recovery of sparse 
signals that lie in the union of shift-invariant subspaces [12], 
[13], as detailed in an extended version of this paper [14]. 

As the MMV problem is a special case of the GMMV problem, obtained by setting $\mathbf{A}^{(i)} = \mathbf{A}$ for all $i$, it follows immediately that, for general $\mathbf{A}^{(i)}$, a worst-case (with respect to the $\mathbf{x}^{(i)}$) analysis reveals no improvements resulting from multiple measurements.

Contributions: The main theme of this paper is a probabilistic (with respect to the $\mathbf{x}^{(i)}$) performance analysis of an $\ell_2/\ell_1$-norm based recovery algorithm, called LOPT, and a variant of OMP, called MOMP, for deterministic measurement matrices $\mathbf{A}^{(i)}$. For the noiseless case, under very general conditions on the $\mathbf{A}^{(i)}$, we find that the failure probability of LOPT and MOMP decays exponentially in the number of measurements $d$. We show that, perhaps surprisingly, having different measurement matrices $\mathbf{A}^{(i)}$ can lead to (substantial) performance improvements over the MMV case $\mathbf{A}^{(0)} = \dots = \mathbf{A}^{(d-1)}$. What is more, these improvements are obtained under very mild "isometry" conditions on the $\mathbf{A}^{(i)}$. Furthermore, we show that our results continue to hold when the measurements are subject to bounded noise.



The probabilistic model on the $\mathbf{x}^{(i)}$ we use is more general than that employed in [10], [11] for the MMV case. Particularizing our results to the MMV case therefore yields generalizations of the main results in [10], [11]. For the noisy case, our result for LOPT is new, even in the MMV case.

We note that the GMMV problem can be cast as a block- 
sparse problem [15], which in turn is contained in the 
model-based [16] setting. However, formulating the GMMV 
problem as a block-sparse (or model-based) problem, and 
applying the corresponding recovery results available in the 
literature yields worst-case recovery conditions only. 

In terms of mathematical tools, we note that the proofs of our main results, provided in [14], consist of two steps. First, we derive conditions for LOPT and MOMP to succeed, and then we use concentration of measure results to show that these conditions are satisfied with high probability, provided that mild conditions on the $\mathbf{A}^{(i)}$ are satisfied. While the proofs in [10], [11] follow these two general steps as well, the technical specifics are quite different. Concretely, the more general probabilistic model for the $\mathbf{x}^{(i)}$ requires the use of concentration of measure results that are more general than those employed in [10], [11]. In addition, our recovery conditions are new and, in particular in the noisy case, nontrivial to derive.

Notation: We use lowercase boldface letters to denote column vectors, e.g., $\mathbf{x}$, and uppercase boldface letters to designate matrices, e.g., $\mathbf{A}$. For a vector $\mathbf{x}$, $[\mathbf{x}]_q$ and $x_q$ denote its $q$th entry. For the matrix $\mathbf{A}$, $\mathbf{A}^{\dagger}$ is its pseudo-inverse and $\|\mathbf{A}\|_{2\to2} := \max_{\|\mathbf{v}\|_2 = 1}\|\mathbf{A}\mathbf{v}\|_2$ its spectral norm. The superscript $H$ stands for Hermitian transposition. For the set $\mathcal{S}$, $|\mathcal{S}|$ is its cardinality and $\bar{\mathcal{S}}$ stands for its complement in $\{0, \dots, n-1\}$. We say that a random variable $x$ is standard Gaussian if it is Gaussian with zero mean and unit variance; $x$ is standard complex Gaussian if $x = x_r + j x_i$, where $x_r, x_i$ are i.i.d. Gaussian with mean zero and variance 1/2.

II. Problem formulation 

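The formal statement of the problem we consider is as follows. Suppose we observe the $m$-dimensional vectors
$$\mathbf{y}^{(i)} = \mathbf{A}^{(i)}\mathbf{x}^{(i)} + \mathbf{e}^{(i)}, \quad i = 0, \dots, d-1 \tag{3}$$
where the $\mathbf{e}^{(i)} \in \mathbb{R}^m$ account for (unknown) noise, the $\mathbf{x}^{(i)} \in \mathbb{R}^n$, $n > m$, share the sparsity pattern $\mathcal{S} \subset \{0, \dots, n-1\}$, i.e., for each $\mathbf{x}^{(i)}$ the entries with index in $\bar{\mathcal{S}}$ are equal to zero, and the measurement matrices $\mathbf{A}^{(0)}, \dots, \mathbf{A}^{(d-1)} \in \mathbb{R}^{m \times n}$ are known. We want to recover $\mathbf{x}^{(0)}, \dots, \mathbf{x}^{(d-1)}$ from $\mathbf{y}^{(0)}, \dots, \mathbf{y}^{(d-1)}$.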

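We first consider the noiseless case, i.e., $\mathbf{e}^{(i)} = \mathbf{0}$ for all $i$. Recovery can be accomplished by solving
$$\text{(P0-GMMV)}\qquad \text{minimize } |\mathcal{S}| \quad \text{subject to } \mathbf{y}^{(i)} = \mathbf{A}^{(i)}\mathbf{x}^{(i)},\ i = 0, \dots, d-1$$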
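which is, however, NP-hard [9]. Computationally efficient alternative recovery algorithms, with, however, weaker recovery guarantees, are specified next. A convex relaxation of (P0-GMMV) is given by
$$\text{(LOPT)}\qquad \text{minimize } \sum_{l=0}^{n-1}\Big(\sum_{i=0}^{d-1}\big|x^{(i)}_l\big|^2\Big)^{1/2} \quad \text{subject to } \mathbf{y}^{(i)} = \mathbf{A}^{(i)}\mathbf{x}^{(i)},\ i = 0, \dots, d-1.$$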
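The $\ell_2/\ell_1$ objective couples the $d$ vectors row by row: it sums, over $l$, the $\ell_2$-norm of $(x^{(0)}_l, \dots, x^{(d-1)}_l)$. The Python sketch below evaluates this mixed norm and, for comparison, the entrywise $\ell_1$-norm. It assumes NumPy and stacks the $\mathbf{x}^{(i)}$ as the columns of an $n \times d$ array; the function names are hypothetical. The example shows that jointly sparse rows are penalized less. The sketch does not solve the program itself, which requires a convex (second-order cone) solver.

```python
import numpy as np


def l2_l1_norm(X):
    """Mixed norm: sum over rows l of ||(x_l^(0), ..., x_l^(d-1))||_2.

    X has shape (n, d); column i holds x^(i).
    """
    return float(np.sum(np.linalg.norm(X, axis=1)))


def joint_support(X, tol=1e-12):
    """Indices l whose row (x_l^(0), ..., x_l^(d-1)) is nonzero."""
    return set(np.flatnonzero(np.linalg.norm(X, axis=1) > tol).tolist())


# Same nonzero magnitudes, arranged jointly sparse (one active row)
# versus spread over two rows.
X_joint = np.array([[3.0, 4.0], [0.0, 0.0]])
X_spread = np.array([[3.0, 0.0], [0.0, 4.0]])
print(l2_l1_norm(X_joint), l2_l1_norm(X_spread))      # 5.0 7.0
print(np.abs(X_joint).sum(), np.abs(X_spread).sum())  # 7.0 7.0
print(joint_support(X_joint))                         # {0}
```

The entrywise $\ell_1$-norm cannot distinguish the two arrangements, whereas the mixed norm favors the one with a single active row.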

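Another alternative, which is an adaptation of OMP and will be called MOMP, is defined as follows. MOMP iteratively builds up the joint support set of $\mathbf{x}^{(0)}, \dots, \mathbf{x}^{(d-1)}$. The algorithm is initialized by choosing the residuals in iteration $p = 0$ as $\mathbf{r}^{(i)}_0 = \mathbf{y}^{(i)}$, $i = 0, \dots, d-1$, and the set of selected indices as $\mathcal{S}_0 = \emptyset$. In the $p$th iteration ($p \geq 1$) we find the index
$$l_p = \arg\max_{l} \sum_{i=0}^{d-1}\big|(\mathbf{a}^{(i)}_l)^H\mathbf{r}^{(i)}_{p-1}\big|$$
where $\mathbf{a}^{(i)}_l$ denotes the $l$th column of $\mathbf{A}^{(i)}$, and update the set of selected indices by setting $\mathcal{S}_p = \mathcal{S}_{p-1} \cup \{l_p\}$. The residuals are updated according to
$$\mathbf{r}^{(i)}_p = \big(\mathbf{I} - \mathbf{P}^{(i)}_{\mathcal{S}_p}\big)\mathbf{y}^{(i)}, \quad i = 0, \dots, d-1$$
where $\mathbf{A}^{(i)}_{\mathcal{S}_p}$ is the matrix obtained from $\mathbf{A}^{(i)}$ by selecting the columns with indices in $\mathcal{S}_p$, and $\mathbf{P}^{(i)}_{\mathcal{S}_p} = \mathbf{A}^{(i)}_{\mathcal{S}_p}(\mathbf{A}^{(i)}_{\mathcal{S}_p})^{\dagger}$ is the orthogonal projector onto the span of the columns of $\mathbf{A}^{(i)}_{\mathcal{S}_p}$. Both LOPT and MOMP are straightforward generalizations of the corresponding algorithms for the MMV case [4], [5], [6], [7].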
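The MOMP iteration can be sketched in a few lines of Python. At each step it selects the index maximizing $\sum_i |\langle \mathbf{a}^{(i)}_l, \mathbf{r}^{(i)}_{p-1}\rangle|$ and then projects every $\mathbf{y}^{(i)}$ onto the orthogonal complement of the selected columns of $\mathbf{A}^{(i)}$. This is a minimal illustration, not the authors' implementation. It assumes NumPy and real-valued matrices, and because no stopping rule is given in the text it runs for exactly $s$ iterations, with $s$ the (assumed known) support size. The projection is computed through a least-squares fit, and the name `momp` is hypothetical.

```python
import numpy as np


def momp(ys, As, s):
    """MOMP sketch for y^(i) = A^(i) x^(i), i = 0, ..., d-1.

    ys: list of d measurement vectors (length m); As: list of d (m x n) arrays;
    s: number of iterations (assumed equal to the known support size).
    Returns the selected indices and least-squares estimates of the x^(i).
    """
    n = As[0].shape[1]
    residuals = [y.copy() for y in ys]  # r_0^(i) = y^(i)
    support = []                        # S_0 = empty set
    for _ in range(s):
        # l_p = argmax_l sum_i |<a_l^(i), r_{p-1}^(i)>|
        scores = sum(np.abs(A.T @ r) for A, r in zip(As, residuals))
        scores[support] = -np.inf  # safeguard (not in the text): never reselect
        support.append(int(np.argmax(scores)))
        # r_p^(i) = (I - P^(i)_{S_p}) y^(i), projection via least squares
        for i, (A, y) in enumerate(zip(As, ys)):
            coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            residuals[i] = y - A[:, support] @ coef
    estimates = []
    for A, y in zip(As, ys):
        x = np.zeros(n)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x[support] = coef
        estimates.append(x)
    return support, estimates


rng = np.random.default_rng(0)
m, n, d, S = 40, 60, 10, [4, 17, 33]
As = [rng.standard_normal((m, n)) / np.sqrt(m) for _ in range(d)]
xs = []
for _ in range(d):
    x = np.zeros(n)
    x[S] = rng.standard_normal(len(S))
    xs.append(x)
ys = [A @ x for A, x in zip(As, xs)]
found, est = momp(ys, As, s=len(S))
print(sorted(found))
```

For this small, well-conditioned noiseless instance the selected indices should coincide with $\mathcal{S}$. In that case the final least-squares step reproduces the $\mathbf{x}^{(i)}$ up to rounding.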

Proceeding to the noisy case, we assume that the noise is bounded in the sense of
$$\sum_{i=0}^{d-1}\big\|\mathbf{e}^{(i)}\big\|_2^2 \leq \varepsilon^2. \tag{4}$$
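For numerical experiments with the noisy model (3), noise satisfying the bound (4) can be generated by drawing random noise vectors and rescaling them jointly. The Python sketch below is illustrative only. The distributional choices (Gaussian matrices with unit-norm columns, standard Gaussian nonzero entries, Gaussian noise directions) and the name `gmmv_instance` are assumptions, not part of the setup above.

```python
import numpy as np


def gmmv_instance(m, n, d, support, eps, rng):
    """Draw (A^(i), x^(i), e^(i), y^(i)), i = 0..d-1, following (3).

    Assumed, for illustration only: Gaussian A^(i) with unit-norm columns,
    i.i.d. standard Gaussian entries of x^(i) on the common support, and
    noise rescaled so that sum_i ||e^(i)||_2^2 = eps^2 (bound (4) with equality).
    """
    As, xs, es = [], [], []
    for _ in range(d):
        A = rng.standard_normal((m, n))
        As.append(A / np.linalg.norm(A, axis=0))  # normalize each column
        x = np.zeros(n)
        x[support] = rng.standard_normal(len(support))
        xs.append(x)
        es.append(rng.standard_normal(m))
    scale = eps / np.sqrt(sum(np.sum(e ** 2) for e in es))
    es = [scale * e for e in es]
    ys = [A @ x + e for A, x, e in zip(As, xs, es)]
    return As, xs, es, ys


rng = np.random.default_rng(1)
As, xs, es, ys = gmmv_instance(m=20, n=50, d=4, support=[3, 8, 21], eps=0.1, rng=rng)
print(sum(np.sum(e ** 2) for e in es))  # total noise energy, eps**2 up to rounding
```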



As exact recovery of the $\mathbf{x}^{(i)}$ will, in general, no longer be possible, we will be content with ensuring that the estimates of the $\mathbf{x}^{(i)}$ are "close" to the true $\mathbf{x}^{(i)}$. The recovery algorithms we analyze in the noisy case are MOMP and a convex program closely related to LOPT, namely
$$\text{(POPT)}\qquad \text{minimize } \frac{1}{2}\sum_{i=0}^{d-1}\big\|\mathbf{y}^{(i)} - \mathbf{A}^{(i)}\mathbf{x}^{(i)}\big\|_2^2 + \lambda\sum_{l=0}^{n-1}\Big(\sum_{i=0}^{d-1}\big|x^{(i)}_l\big|^2\Big)^{1/2}$$
which, for $d = 1$, is known as the lasso [17] in the statistics literature and, for $d > 1$, is a particular variant of the group lasso [18]. The first term in the cost function of POPT accounts for the recovery error and the second term enforces (joint) sparsity; the parameter $\lambda > 0$ controls the tradeoff between these two terms.
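One standard way to solve a program of this type numerically (not necessarily the method used by the authors) is proximal gradient descent (ISTA). Each iteration takes a gradient step on the quadratic data term and then group soft-thresholds every row, which is the proximal operator of the $\ell_2/\ell_1$ penalty. The Python sketch below assumes NumPy, real-valued data, the objective $\frac{1}{2}\sum_i\|\mathbf{y}^{(i)} - \mathbf{A}^{(i)}\mathbf{x}^{(i)}\|_2^2 + \lambda\sum_l(\sum_i|x^{(i)}_l|^2)^{1/2}$, and a fixed iteration count instead of a convergence test. The name `popt_ista` is hypothetical.

```python
import numpy as np


def popt_ista(ys, As, lam, iters=500):
    """ISTA sketch for POPT.

    Minimizes 0.5 * sum_i ||y^(i) - A^(i) x^(i)||_2^2
              + lam * sum_l ||(x_l^(0), ..., x_l^(d-1))||_2
    and returns X of shape (n, d) whose column i estimates x^(i).
    """
    n = As[0].shape[1]
    d = len(ys)
    # The data term is separable in i, so its gradient is Lipschitz with
    # constant L = max_i ||A^(i)||_{2->2}^2; use step size 1/L.
    step = 1.0 / max(np.linalg.norm(A, 2) ** 2 for A in As)
    X = np.zeros((n, d))
    for _ in range(iters):
        grad = np.column_stack(
            [A.T @ (A @ X[:, i] - y) for i, (A, y) in enumerate(zip(As, ys))]
        )
        Z = X - step * grad
        # Proximal step: shrink the l2-norm of each row of Z by step * lam.
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        factor = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-300))
        X = factor * Z
    return X


# With A^(i) = I the minimizer is explicit: row l of [y^(0) ... y^(d-1)] is
# scaled by max(0, 1 - lam / ||row l||_2).
ys = [np.array([3.0, 0.3]), np.array([4.0, 0.4])]
X = popt_ista(ys, [np.eye(2), np.eye(2)], lam=1.0)
print(np.round(X, 6))  # row 0 -> 0.8 * (3, 4); row 1 (norm 0.5 < lam) -> 0
```

The identity-matrix example illustrates how the penalty removes entire rows at once, which is what makes POPT produce jointly sparse estimates.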

III. Review of worst-case recovery results 

We briefly discuss worst-case recovery results for the 
GMMV problem. Formulating the GMMV problem as a 
block-sparse recovery problem and evaluating the corre- 
sponding recovery conditions in [15] yields the following 
proposition. 



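Proposition 1: Let $\mathcal{S}$ be the sparsity pattern of $\mathbf{x}^{(0)}, \dots, \mathbf{x}^{(d-1)}$ and assume that
$$\max_{l \notin \mathcal{S}}\ \sum_{q}\ \max_{i=0,\dots,d-1}\Big|\big[(\mathbf{A}^{(i)}_{\mathcal{S}})^{\dagger}\mathbf{a}^{(i)}_l\big]_q\Big| < 1. \tag{5}$$
Then, LOPT and MOMP recover $\mathbf{x}^{(0)}, \dots, \mathbf{x}^{(d-1)}$ exactly from $\mathbf{y}^{(i)} = \mathbf{A}^{(i)}\mathbf{x}^{(i)}$, $i = 0, \dots, d-1$.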

For the MMV case, Proposition 1 reduces to [5, Th. 3.1]. Condition (5) can be viewed as the GMMV equivalent of the SMV exact recovery condition, a standard recovery condition for $\ell_1$-minimization and OMP [19].

An alternative recovery condition can be obtained by viewing the GMMV problem as $d$ separate SMV problems and requiring exact recovery for each of the resulting SMV problems. Following this route, based on the SMV exact recovery condition [19, Th. A], we get that $\ell_1$-minimization and OMP applied individually to $\mathbf{y}^{(i)} = \mathbf{A}^{(i)}\mathbf{x}^{(i)}$ recover $\mathbf{x}^{(0)}, \dots, \mathbf{x}^{(d-1)}$ correctly if
$$\max_{l \notin \mathcal{S}}\ \max_{i=0,\dots,d-1}\big\|(\mathbf{A}^{(i)}_{\mathcal{S}})^{\dagger}\mathbf{a}^{(i)}_l\big\|_1 < 1. \tag{6}$$
Since a maximum over $i$ of sums over $q$ never exceeds the sum over $q$ of maxima over $i$, (5) implies (6); that is, (6) is a slightly weaker condition than (5). Hence Proposition 1 does not predict any improvement of using LOPT or MOMP over treating the recovery problem as individual SMV problems (solved through $\ell_1$-minimization and/or OMP).
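The left-hand sides of (5) and (6) are easy to evaluate for given matrices and a given support. Both are computed from the vectors $(\mathbf{A}^{(i)}_{\mathcal{S}})^{\dagger}\mathbf{a}^{(i)}_l$ and differ only in the order of the sum over entries $q$ and the maximum over $i$. As a result, the value for (5) is never smaller than the value for (6). The Python sketch below assumes NumPy and real-valued matrices; `erc_lhs` is a hypothetical name.

```python
import numpy as np


def erc_lhs(As, S):
    """Return the left-hand sides of (5) and (6) for support S (list of indices).

    For l outside S, v_l^(i) = pinv(A_S^(i)) a_l^(i) (a vector of length |S|).
      (5): max_l sum_q max_i |[v_l^(i)]_q|
      (6): max_l max_i sum_q |[v_l^(i)]_q|
    """
    n = As[0].shape[1]
    S_set = set(S)
    pinvs = [np.linalg.pinv(A[:, S]) for A in As]
    lhs5 = lhs6 = 0.0
    for l in range(n):
        if l in S_set:
            continue
        # V[q, i] = |[v_l^(i)]_q|
        V = np.abs(np.column_stack([P @ A[:, l] for P, A in zip(pinvs, As)]))
        lhs5 = max(lhs5, float(V.max(axis=1).sum()))
        lhs6 = max(lhs6, float(V.sum(axis=0).max()))
    return lhs5, lhs6


rng = np.random.default_rng(2)
As = [rng.standard_normal((30, 60)) for _ in range(5)]
lhs5, lhs6 = erc_lhs(As, S=[0, 1, 2])
print(lhs5 >= lhs6)  # True: max_i sum_q <= sum_q max_i
```

For $d = 1$ the two quantities coincide, which matches the statement that both conditions reduce to the SMV exact recovery condition in that case.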

IV. Main results 
We discuss the noiseless and the noisy case separately. 

A. Recovery in the noiseless case 

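For the noiseless case, the probabilistic model on the $\mathbf{x}^{(i)}$ is as follows: For a given support set $\mathcal{S} \subset \{0, \dots, n-1\}$, we take the entries of the vectors $\mathbf{x}^{(0)}_{\mathcal{S}}, \dots, \mathbf{x}^{(d-1)}_{\mathcal{S}}$ (where $\mathbf{x}_{\mathcal{S}}$ denotes the subvector of $\mathbf{x}$ with entries indexed by $\mathcal{S}$) to be independent sub-Gaussian [20].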

Definition 1: A zero-mean random variable $x$ is $\rho$-sub-Gaussian¹, with $\rho > 0$, if its moment generating function satisfies
$$\mathbb{E}\big[e^{tx}\big] \leq e^{\rho t^2} \quad \text{for all } t \in \mathbb{R}. \tag{7}$$
Sub-Gaussian random variables contain Gaussian and all bounded² random variables as special cases. We start with our main result for LOPT in the noiseless case.
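As a sanity check of Definition 1, the Python sketch below compares two known moment generating functions with $e^{\rho t^2}$ on a grid of $t$ values. A standard Gaussian has $\mathbb{E}[e^{tx}] = e^{t^2/2}$, and a Rademacher variable (values $\pm 1$ with equal probability) has $\mathbb{E}[e^{tx}] = \cosh t \leq e^{t^2/2}$, so both are $1/2$-sub-Gaussian. The sketch assumes NumPy, and `satisfies_mgf_bound` is a hypothetical name. A grid check illustrates (7) but does not prove it.

```python
import numpy as np


def satisfies_mgf_bound(mgf, rho, ts, rtol=1e-12):
    """Check E[exp(t x)] <= exp(rho t^2), i.e. (7), at every t in the grid ts."""
    return bool(np.all(mgf(ts) <= np.exp(rho * ts ** 2) * (1 + rtol)))


def gaussian_mgf(t):
    """MGF of a standard Gaussian: exp(t^2 / 2)."""
    return np.exp(t ** 2 / 2)


rademacher_mgf = np.cosh  # x = +1 or -1 with probability 1/2 each

ts = np.linspace(-10.0, 10.0, 2001)
print(satisfies_mgf_bound(gaussian_mgf, 0.5, ts))    # True (with equality)
print(satisfies_mgf_bound(rademacher_mgf, 0.5, ts))  # True
print(satisfies_mgf_bound(rademacher_mgf, 0.4, ts))  # False: fails near t = 0
```

The failure for $\rho = 0.4$ reflects that $\rho$ must be at least half the variance, which is 1/2 for both unit-variance examples.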

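Theorem 1: Fix $\mathcal{S} \subset \{0, \dots, n-1\}$ with cardinality $s := |\mathcal{S}|$, and take the entries of $\mathbf{x}^{(0)}_{\mathcal{S}}, \dots, \mathbf{x}^{(d-1)}_{\mathcal{S}}$ to be i.i.d. zero-mean $\rho$-sub-Gaussian with unit variance³. Assume that the measurement matrices $\mathbf{A}^{(0)}, \dots, \mathbf{A}^{(d-1)}$ satisfy
$$\Big(\frac{1}{d}\sum_{i=0}^{d-1}\big\|(\mathbf{A}^{(i)}_{\mathcal{S}})^{\dagger}\mathbf{a}^{(i)}_l\big\|_2^2\Big)^{1/2} \leq \alpha < 1, \quad \text{for all } l \notin \mathcal{S} \tag{8}$$
and
$$\max_{i=0,\dots,d-1}\big\|(\mathbf{A}^{(i)}_{\mathcal{S}})^{\dagger}\mathbf{a}^{(i)}_l\big\|_2 \leq \gamma, \quad \text{for all } l \notin \mathcal{S} \tag{9}$$
for some $\gamma > 0$, where $\mathbf{a}^{(i)}_l$ denotes the $l$th column of $\mathbf{A}^{(i)}$. Then, for every $\xi > 0$ satisfying $\max\{1 - 32e\rho, \alpha^2\} < \xi^2 < \alpha^2(1 + 32e\rho)$, with probability at least
$$1 - (n-s)\exp\!\Big(-d\,\frac{(\xi^2-\alpha^2)^2}{2^{11}e^2\rho^2\gamma^2\alpha^2}\Big) - s\exp\!\Big(-d\,\frac{(1-\xi^2)^2}{2^{11}e^2\rho^2}\Big) \tag{10}$$
LOPT applied to $\mathbf{y}^{(i)} = \mathbf{A}^{(i)}\mathbf{x}^{(i)}$, $i = 0, \dots, d-1$, recovers the correct solution $\mathbf{x}^{(0)}, \dots, \mathbf{x}^{(d-1)}$.

¹Sub-Gaussian random variables are often equivalently defined through tail bounds or through moment bounds, see, e.g., [20]. The definition we chose is the most convenient for our purposes.

²The random variable $x$ is bounded if there exists an $M > 0$ such that $P[|x| \leq M] = 1$.

³This is w.l.o.g., as the entries of the $\mathbf{x}^{(i)}$ can be scaled to account for non-unit variance.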

The main implication of Theorem 1 is that, provided (8) (and (9)) is satisfied, the probability that LOPT fails decays exponentially in the number of measurements $d$. This has been shown before for the MMV case under the assumption of i.i.d. Gaussian $\mathbf{x}^{(i)}$ [10, Th. 4.4].

The constants in the exponents of (10) can be improved (significantly) for certain distributions. For example, when the entries of the $\mathbf{x}^{(i)}$ are i.i.d. standard Gaussian (note that a standard Gaussian is sub-Gaussian with $\rho = 1/2$), the recovery probability is at least [14]
$$1 - (n-s)\exp\big(-d\,\text{[illegible]}\big) - s\exp\big(\text{[illegible]}\big). \tag{11}$$

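Improvements over worst-case results: First note that (9) holds with $\gamma$ equal to the largest of the finitely many quantities on its left-hand side, so (9) is not restrictive; $\gamma$ only enters the probability bound (10). To see that the recovery condition (8) is weaker than the worst-case recovery condition (6) (recall that (5) implies (6)), we simply note that
$$\Big(\frac{1}{d}\sum_{i=0}^{d-1}\big\|(\mathbf{A}^{(i)}_{\mathcal{S}})^{\dagger}\mathbf{a}^{(i)}_l\big\|_2^2\Big)^{1/2} \leq \max_{i=0,\dots,d-1}\big\|(\mathbf{A}^{(i)}_{\mathcal{S}})^{\dagger}\mathbf{a}^{(i)}_l\big\|_2 \leq \max_{i=0,\dots,d-1}\big\|(\mathbf{A}^{(i)}_{\mathcal{S}})^{\dagger}\mathbf{a}^{(i)}_l\big\|_1.$$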
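The comparison between (8) and (6) rests on the chain "root mean square over $i$ of the $\ell_2$-norms $\leq$ maximum over $i$ of the $\ell_2$-norms $\leq$ maximum over $i$ of the $\ell_1$-norms", applied to $(\mathbf{A}^{(i)}_{\mathcal{S}})^{\dagger}\mathbf{a}^{(i)}_l$. The Python sketch below checks this numerically. For a given support, it computes the smallest admissible $\alpha$ in (8), the smallest admissible $\gamma$ in (9), and the left-hand side of (6). It assumes NumPy and real-valued matrices; `lopt_condition_lhs` is a hypothetical name.

```python
import numpy as np


def lopt_condition_lhs(As, S):
    """Return (alpha, gamma, erc) for support S.

    With v_l^(i) = pinv(A_S^(i)) a_l^(i) and l ranging over the complement of S:
      alpha = max_l sqrt(mean_i ||v_l^(i)||_2^2)   (smallest alpha in (8))
      gamma = max_l max_i ||v_l^(i)||_2             (smallest gamma in (9))
      erc   = max_l max_i ||v_l^(i)||_1             (left-hand side of (6))
    """
    n = As[0].shape[1]
    S_set = set(S)
    pinvs = [np.linalg.pinv(A[:, S]) for A in As]
    alpha = gamma = erc = 0.0
    for l in range(n):
        if l in S_set:
            continue
        vs = [P @ A[:, l] for P, A in zip(pinvs, As)]
        l2 = np.array([np.linalg.norm(v) for v in vs])
        l1 = np.array([np.sum(np.abs(v)) for v in vs])
        alpha = max(alpha, float(np.sqrt(np.mean(l2 ** 2))))
        gamma = max(gamma, float(l2.max()))
        erc = max(erc, float(l1.max()))
    return alpha, gamma, erc


rng = np.random.default_rng(3)
As = [rng.standard_normal((25, 50)) for _ in range(6)]
alpha, gamma, erc = lopt_condition_lhs(As, S=[5, 6, 7, 8])
print(alpha <= gamma <= erc)  # True, by the chain of inequalities
```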




Improvements due to different measurement matrices: Evaluating (8) for the MMV case yields
$$\big\|\mathbf{A}_{\mathcal{S}}^{\dagger}\mathbf{a}_l\big\|_2 \leq \alpha < 1, \quad \text{for all } l \notin \mathcal{S}. \tag{12}$$
Note that (12) is the recovery condition stated in [10, Th. 4.4], which applies to the case where the entries of the $\mathbf{x}^{(i)}$ are i.i.d. Gaussian. Comparing (12) to (8), we see that in the GMMV case the measurement matrices have to satisfy $\|(\mathbf{A}^{(i)}_{\mathcal{S}})^{\dagger}\mathbf{a}^{(i)}_l\|_2^2 \leq \alpha^2$ only on average (i.e., across $i$). This essentially says that having different measurement matrices allows some of them to be "bad" as long as the collection $\{\mathbf{A}^{(0)}, \dots, \mathbf{A}^{(d-1)}\}$ is good enough on average. In contrast, in the MMV case, the single measurement matrix $\mathbf{A}$ has to be "good" in the sense of (12).

This can be nicely illustrated by way of an example. Suppose we are given a measurement matrix $\mathbf{A}$ that does not satisfy (12) for all $\mathcal{S} \subset \{0, \dots, n-1\}$ with $|\mathcal{S}| \leq k$, for a given $k$, but does so on average over those $\mathcal{S}$. Now, take the matrices $\mathbf{A}^{(0)}, \dots, \mathbf{A}^{(d-1)}$ to be obtained independently by permuting the columns of $\mathbf{A}$. Then, if $d$ is sufficiently large, (8) will be satisfied with high probability for all $\mathcal{S}$ with $|\mathcal{S}| \leq k$.
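The column-permutation example can be simulated. The Python sketch below uses a hypothetical $20 \times 30$ matrix $\mathbf{A}$ in which $\mathbf{a}_0$ and $\mathbf{a}_2$ are orthonormal and $\mathbf{a}_1 = (\mathbf{a}_0 + \mathbf{a}_2)/\sqrt{2}$. For $\mathcal{S} = \{0, 2\}$ and $l = 1$, the left-hand side of (12) then equals 1, so (12) cannot hold with $\alpha < 1$ for this support. With $d = 200$ independent column permutations of $\mathbf{A}$, the left-hand side of (8) for the same $\mathcal{S}$ and $l$ averages over many (support, column) pairs and is typically much smaller. The construction, the sizes, and the helper `lhs8` are illustrative assumptions (NumPy assumed).

```python
import numpy as np


def lhs8(As, S, l):
    """Left-hand side of (8) for one index l outside S."""
    vals = [np.linalg.norm(np.linalg.pinv(A[:, S]) @ A[:, l]) ** 2 for A in As]
    return float(np.sqrt(np.mean(vals)))


rng = np.random.default_rng(4)
m, n, d = 20, 30, 200
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)               # unit-norm columns
A[:, 0] = np.eye(m)[0]                       # a_0 = e_0
A[:, 2] = np.eye(m)[1]                       # a_2 = e_1, orthogonal to a_0
A[:, 1] = (A[:, 0] + A[:, 2]) / np.sqrt(2)   # a_1 lies in span{a_0, a_2}

S, l = [0, 2], 1
single = lhs8([A], S, l)                     # MMV: left-hand side of (12)
perms = [A[:, rng.permutation(n)] for _ in range(d)]
averaged = lhs8(perms, S, l)                 # GMMV with permuted copies of A
print(round(single, 6), round(averaged, 3))
```

The permuted copies rarely place the "bad" triple $(\mathbf{a}_0, \mathbf{a}_2; \mathbf{a}_1)$ into positions $(\mathcal{S}; l)$, so its contribution is diluted by the average across $i$.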

We next state our recovery results for MOMP and start 
by defining the following quantities, which are used to for- 
mulate "local" (i.e., pertaining to the (given) set S) isometry 
conditions. These quantities were also used in [10], [11] in 
the performance analysis of MOMP for the MMV case. 

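For a given set $\mathcal{S} \subset \{0, \dots, n-1\}$, let
$$\delta_i(\mathcal{S}) := \big\|(\mathbf{A}^{(i)}_{\mathcal{S}})^H\mathbf{A}^{(i)}_{\mathcal{S}} - \mathbf{I}\big\|_{2\to2}.$$
Observe that
$$(1 - \delta_i(\mathcal{S}))\|\mathbf{x}_{\mathcal{S}}\|_2^2 \leq \big\|\mathbf{A}^{(i)}_{\mathcal{S}}\mathbf{x}_{\mathcal{S}}\big\|_2^2 \leq (1 + \delta_i(\mathcal{S}))\|\mathbf{x}_{\mathcal{S}}\|_2^2$$
for all $\mathbf{x}_{\mathcal{S}} \in \mathbb{R}^{|\mathcal{S}|}$. Define
$$\mu_i(\mathcal{S}) := \max\Big\{\max_{l \notin \mathcal{S}}\big\|(\mathbf{A}^{(i)}_{\mathcal{S}})^H\mathbf{a}^{(i)}_l\big\|_2,\ \max_{l \in \mathcal{S}}\big\|(\mathbf{A}^{(i)}_{\mathcal{S}\setminus\{l\}})^H\mathbf{a}^{(i)}_l\big\|_2\Big\}$$
and let $\delta_{\max}(\mathcal{S}) = \max_i \delta_i(\mathcal{S})$ and $\mu_{\max}(\mathcal{S}) = \max_i \mu_i(\mathcal{S})$.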
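The local isometry quantities can be computed directly from their definitions. The Python sketch below evaluates $\delta_i(\mathcal{S})$ and $\mu_i(\mathcal{S})$ for a single matrix and then checks the two-sided isometry bound for a random $\mathbf{x}_{\mathcal{S}}$. It assumes NumPy, and the names `delta_i` and `mu_i` are hypothetical. The vector norm in $\mu_i(\mathcal{S})$ is exposed as the parameter `norm_ord` and defaults to the $\ell_2$-norm; this default is an assumption.

```python
import numpy as np


def delta_i(A, S):
    """delta_i(S) = ||A_S^H A_S - I||_{2->2}."""
    A_S = A[:, S]
    return float(np.linalg.norm(A_S.conj().T @ A_S - np.eye(len(S)), 2))


def mu_i(A, S, norm_ord=2):
    """Largest norm of A_S^H a_l (l outside S) or A_{S without l}^H a_l (l in S)."""
    vals = [0.0]
    for l in range(A.shape[1]):
        T = [k for k in S if k != l]  # equals S when l is outside S
        if T:
            vals.append(float(np.linalg.norm(A[:, T].conj().T @ A[:, l], norm_ord)))
    return max(vals)


rng = np.random.default_rng(5)
A = rng.standard_normal((40, 80))
A /= np.linalg.norm(A, axis=0)  # unit-norm columns, as assumed in Theorem 2
S = [1, 9, 30, 42]
dlt, mu = delta_i(A, S), mu_i(A, S)
x = rng.standard_normal(len(S))
energy = np.linalg.norm(A[:, S] @ x) ** 2
print((1 - dlt) * (x @ x) <= energy <= (1 + dlt) * (x @ x))  # True
```

Computing $\delta_{\max}(\mathcal{S})$ and $\mu_{\max}(\mathcal{S})$ then amounts to taking the maximum of these values over the $d$ matrices.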

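Theorem 2: Fix $\mathcal{S} \subset \{0, \dots, n-1\}$ with cardinality $s := |\mathcal{S}|$, let the measurement matrices $\mathbf{A}^{(0)}, \dots, \mathbf{A}^{(d-1)} \in \mathbb{R}^{m \times n}$ have unit-norm columns with $\mu_{\max}(\mathcal{S}) < 1$ and $\delta_{\max}(\mathcal{S}) < 1$, and let the entries of $\mathbf{x}^{(0)}_{\mathcal{S}}, \dots, \mathbf{x}^{(d-1)}_{\mathcal{S}}$ be i.i.d. zero-mean $\rho$-sub-Gaussian with unit variance. If
$$\text{[illegible in source]} \tag{13}$$
for $\beta$ with $0 < \beta < 32e\rho$, then MOMP applied to $\mathbf{y}^{(i)} = \mathbf{A}^{(i)}\mathbf{x}^{(i)}$, $i = 0, \dots, d-1$, recovers the correct solution $\mathbf{x}^{(0)}, \dots, \mathbf{x}^{(d-1)}$ with probability at least
$$\text{[illegible in source]} \tag{14}$$
where $c(\mathcal{S}, \mathbf{A})$ is a constant that depends on the $\mathbf{A}^{(i)}$ but is independent of $d$.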

Remark: The constant $c(\mathcal{S}, \mathbf{A})$ can be lower-bounded in terms of the $\mu_i(\mathcal{S})$ and $\delta_i(\mathcal{S})$; see [14].

The main implication of Theorem 2 is that, provided (13) is satisfied, the probability that MOMP fails decays exponentially in the number of measurements $d$. This has been shown before in [10], [11] for the MMV case, under the assumption of i.i.d. Gaussian $\mathbf{x}^{(i)}$. The implications of Theorem 2 concerning improvements over the worst-case results and over the MMV case are as discussed above for LOPT. Furthermore, as in the case of LOPT, Theorem 2 can be strengthened for certain distributions. For example, when the entries of $\mathbf{x}^{(0)}_{\mathcal{S}}, \dots, \mathbf{x}^{(d-1)}_{\mathcal{S}}$ are i.i.d. standard Gaussian, Theorem 2 holds with condition (13) replaced by
$$\text{[illegible in source]} \tag{15}$$
for $\beta > 0$, where the constant greater than 1 appearing in (15) tends to 1 as $d$ grows, and with (14) replaced by
$$1 - 2s\Big((n-s)\exp\big(-d\beta^2 c(\mathcal{S}, \mathbf{A})\big) + \exp\big(-\text{[illegible]}\,c(\mathcal{S}, \mathbf{A})\big)\Big). \tag{16}$$

We finally note that condition (13) is slightly stronger than condition (8) pertaining to LOPT [14].

B. The noisy GMMV problem 

We next present our results for the noisy GMMV problem and start with the probabilistic analysis of POPT. For the following result, we assume that the entries of $\mathbf{x}^{(0)}_{\mathcal{S}}, \dots, \mathbf{x}^{(d-1)}_{\mathcal{S}}$ are i.i.d. Rademacher random variables, i.e., they take on the values $+1$ and $-1$ with equal probability. We chose this model for convenience and note that similar results can be obtained for the sub-Gaussian case. The corresponding analysis is, however, much more cumbersome and does not yield additional insights.

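Theorem 3: Fix $\mathcal{S} \subset \{0, \dots, n-1\}$ with cardinality $s := |\mathcal{S}|$, and take the entries of $\mathbf{x}^{(0)}_{\mathcal{S}}, \dots, \mathbf{x}^{(d-1)}_{\mathcal{S}} \in \mathbb{R}^s$ to be i.i.d. Rademacher. Suppose the measurement matrices $\mathbf{A}^{(0)}, \dots, \mathbf{A}^{(d-1)} \in \mathbb{R}^{m \times n}$ satisfy conditions (8) and (9) for $\alpha < 1$ and some $\gamma > 0$. Suppose the noise level $\varepsilon$ in (4) and the parameter $\lambda$ of POPT satisfy
$$c_3\,\varepsilon + \lambda\, c_4\sqrt{|\mathcal{S}|}\ \text{[illegible]} < \sqrt{d}\,\big(1 - \text{[illegible]}\big) \tag{17}$$
where $c_1$, $c_2$, $c_3$, and $c_4$ are constants depending on $\delta_{\max}(\mathcal{S})$ and $\mu_{\max}(\mathcal{S})$ only. Then, for $\xi > 0$ such that $\max\{1 - 16e, \alpha^2\} < \xi^2 < \alpha^2(1 + 16e)$, with probability at least
$$1 - \exp\!\Big(-d\,\frac{(\xi^2-\alpha^2)^2}{512\,e^2\gamma^2\alpha^2}\Big) \tag{18}$$
the solution to POPT applied to the noisy measurements $\mathbf{y}^{(i)}$ in (3), $i = 0, \dots, d-1$, denoted by $\hat{\mathbf{x}}^{(0)}, \dots, \hat{\mathbf{x}}^{(d-1)}$, is supported on $\mathcal{S}$ and satisfies
$$\Big(\sum_{i=0}^{d-1}\big\|\hat{\mathbf{x}}^{(i)} - \mathbf{x}^{(i)}\big\|_2^2\Big)^{1/2} \leq c_3\,\varepsilon + \lambda\, c_4\sqrt{|\mathcal{S}|}. \tag{19}$$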



The main implication of Theorem 3 is that, under certain conditions on the $\mathbf{A}^{(i)}$ and for the noise level $\varepsilon$ sufficiently small, the probability that POPT produces a solution with correct support set that is "close" in $\ell_2$-norm to the true $\mathbf{x}^{(i)}$ tends to 1 exponentially fast in $d$. This result is also new for the MMV case. Condition (17) ensures that the noise level $\varepsilon$ is sufficiently small. Note that (17) depends on the "worst" measurement matrix through $\delta_{\max}(\mathcal{S})$ and $\mu_{\max}(\mathcal{S})$. This is sensible, as noise has the largest effect on the measurement $\mathbf{y}^{(i)}$ taken through the "worst" measurement matrix.

We finally turn to the performance of MOMP. This result will be stated for i.i.d. sub-Gaussian $\mathbf{x}^{(i)}$.

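Theorem 4: Fix $\mathcal{S} \subset \{0, \dots, n-1\}$ with cardinality $s := |\mathcal{S}|$, and let the measurement matrices $\mathbf{A}^{(0)}, \dots, \mathbf{A}^{(d-1)} \in \mathbb{R}^{m \times n}$ have unit-norm columns with $\mu_{\max}(\mathcal{S}) < 1$ and $\delta_{\max}(\mathcal{S}) < 1$. Let the entries of $\mathbf{x}^{(0)}_{\mathcal{S}}, \dots, \mathbf{x}^{(d-1)}_{\mathcal{S}} \in \mathbb{R}^s$ be i.i.d. zero-mean $\rho$-sub-Gaussian with unit variance. Suppose that
$$\varepsilon < \frac{(1 - \delta_{\max}(\mathcal{S}))\,\chi}{\text{[illegible]} + (1 - \delta_{\max}(\mathcal{S}))\,\mu_{\max}(\mathcal{S})} \tag{20}$$
for some $\chi > 0$. If
$$\text{[illegible in source]} \tag{21}$$
for $\beta$ satisfying $0 < \beta < 32e\rho$, then, with probability at least (14), MOMP applied to the noisy measurements $\mathbf{y}^{(i)}$ in (3), $i = 0, \dots, d-1$, yields estimates $\hat{\mathbf{x}}^{(0)}, \dots, \hat{\mathbf{x}}^{(d-1)}$ of the $\mathbf{x}^{(i)}$ that are supported on $\mathcal{S}$ and satisfy
$$\Big(\sum_{i=0}^{d-1}\big\|\hat{\mathbf{x}}^{(i)} - \mathbf{x}^{(i)}\big\|_2^2\Big)^{1/2} \leq \text{[illegible]}\ \varepsilon. \tag{22}$$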

Again, the main implication of Theorem 4 is that, under certain mild conditions on the $\mathbf{A}^{(i)}$ and for the noise level $\varepsilon$ sufficiently small, the probability that MOMP produces a solution with correct support set that is "close" to the true $\mathbf{x}^{(i)}$ tends to 1 exponentially fast in $d$. This was shown in [11] for the MMV case and for i.i.d. Gaussian $\mathbf{x}^{(i)}$. Note that for $\varepsilon = 0$, i.e., in the noiseless case, (21) reduces to (13), and Theorem 4 reduces to Theorem 2. For $\varepsilon > 0$, and hence $\chi > 0$, (21) is more restrictive than condition (13). Condition (20) depends on the "worst" measurement matrix and ensures that the noise level $\varepsilon$ is sufficiently small. The constants in Theorem 4 can be improved for i.i.d. Gaussian $\mathbf{x}^{(i)}$ [14].

We conclude by noting that the results in this paper extend straightforwardly to the case of complex $\mathbf{A}^{(i)}$ and $\mathbf{x}^{(i)}$.